label model
Statistical Analysis of an Adversarial Bayesian Weak Supervision Method
Programmatic Weak Supervision (PWS) aims to reduce the cost of constructing the large, high-quality labeled datasets often used to train modern machine learning models. A major component of the PWS pipeline is the label model, which amalgamates predictions from multiple noisy weak supervision sources, i.e., labeling functions (LFs), to label datapoints. While most label models are either probabilistic or adversarial, a recently proposed label model achieves strong empirical performance without falling into either camp. That label model constructs a polytope of plausible labelings using the LF predictions and outputs the center of that polytope as its proposed labeling. In this paper, we attempt to theoretically study that strategy by proposing Bayesian Balsubramani-Freund (BBF), a label model that implicitly constructs a polytope of plausible labelings and selects a labeling in its interior. We show an assortment of statistical results for BBF: log-concavity of its posterior, its form of solution, consistency, and rates of convergence. Extensive experiments compare our proposed method against twelve baseline label models over eleven datasets. BBF compares favorably to other Bayesian label models and label models that don't use datapoint features -- matching or exceeding their performance on eight out of eleven datasets.
A Label model and illustrations
A.1 Majority Voting
Majority Voting (MV) is the most intuitive algorithm for aggregating LFs' annotations. We omit this case for simplicity.
A.3 Snorkel MeTaL
For the parameters $\mu$ of Snorkel MeTaL [31], by Bayes' theorem we have: $p_\mu(y = c, \lambda = m) = p_\mu(\lambda = m \mid y = c)\, p(y = c) =$
Consider a label model $g(L(x), x) \in \mathcal{F}$ in an arbitrary functional class $\mathcal{F}$, e.g., neural networks, that has an additional dependency on the data feature $x$. We can still approximate such a complicated function with an identity function-based label model $g_{W(x)}(L(x))$, similar to the aforementioned one except that $W(x): \mathcal{X} \to \mathbb{R}^{M \times (C+1) \times C}$ is a similarly complicated function, e.g., a neural network, that maps each data point $x \in \mathcal{X}$ to a unique label model parameter $W(x)$. We leave the exploration of more complicated forms of label models to future work.
B.1 Case 1: identity function
We define the loss with reweighted samples as, Instead of employing the decomposed loss function, we introduce a more general influence estimation method, weight-moving influence, which gets rid of the loss decomposition and approximation and is agnostic to the selection of ฯ() function.
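As an illustration of the Majority Voting label model mentioned in A.1, the following is a minimal sketch rather than the implementation used in the paper. It assumes that LF votes are stored in an integer matrix with one row per data point and one column per LF, that the value `-1` encodes an abstention, and that ties are broken in favor of the lowest class index; the names `ABSTAIN` and `majority_vote` are hypothetical.

```python
import numpy as np

ABSTAIN = -1  # assumed encoding for "this LF did not vote"


def majority_vote(L, n_classes):
    """Aggregate LF votes by majority.

    L: (n_points, n_lfs) integer matrix with entries in {0, ..., n_classes-1}
       or ABSTAIN.
    Returns one label per data point, or ABSTAIN if no LF voted on it.
    """
    L = np.asarray(L)
    counts = np.zeros((L.shape[0], n_classes), dtype=int)
    for c in range(n_classes):
        # Number of LFs that voted for class c on each data point.
        counts[:, c] = (L == c).sum(axis=1)
    # argmax returns the first maximum, so ties go to the lowest class index.
    labels = counts.argmax(axis=1)
    labels[counts.sum(axis=1) == 0] = ABSTAIN
    return labels


# Three data points, three LFs, two classes.
L = np.array([
    [0, 0, 1],     # two votes for class 0, one for class 1
    [1, -1, 1],    # one abstention, two votes for class 1
    [-1, -1, -1],  # every LF abstains
])
preds = majority_vote(L, n_classes=2)
```

Here `preds` holds one aggregated label per row of `L`, with `ABSTAIN` marking points that received no votes.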
Understanding Programmatic Weak Supervision via Source-aware Influence Function
Programmatic Weak Supervision (PWS) aggregates the source votes of multiple weak supervision sources into probabilistic training labels, which are in turn used to train an end model. With its increasing popularity, it is critical to have some tool for users to understand the influence of each component (e.g., the source vote or training data) in the pipeline and interpret the end model behavior. To achieve this, we build on Influence Function (IF) and propose source-aware IF, which leverages the generation process of the probabilistic labels to decompose the end model's training objective and then calculate the influence associated with each (data, source, class) tuple. These primitive influence scores can then be used to estimate the influence of individual components of PWS, such as source vote, supervision source, and training data. On datasets of diverse domains, we demonstrate multiple use cases: (1) interpreting incorrect predictions from multiple angles that reveals insights for debugging the PWS pipeline, (2) identifying mislabeling of sources with a gain of 9%-37% over baselines, and (3) improving the end model's generalization performance by removing harmful components in the training objective (13%-24% better than ordinary IF).
Learnability with Partial Labels and Adaptive Nearest Neighbors
Errandonea, Nicolas A., Mazuelas, Santiago, Lozano, Jose A., Dasgupta, Sanjoy
Prior work on partial labels learning (PLL) has shown that learning is possible even when each instance is associated with a bag of labels, rather than a single accurate but costly label. However, the necessary conditions for learning with partial labels remain unclear, and existing PLL methods are effective only in specific scenarios. In this work, we mathematically characterize the settings in which PLL is feasible. In addition, we present PL A-$k$NN, an adaptive nearest-neighbors algorithm for PLL that is effective in general scenarios and enjoys strong performance guarantees. Experimental results corroborate that PL A-$k$NN can outperform state-of-the-art methods in general PLL scenarios.
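To make the partial-label setting concrete, the sketch below shows a plain, non-adaptive $k$-nearest-neighbors vote over candidate label sets. It is not PL A-$k$NN, whose adaptive choice of neighborhood is not described in this abstract. The function name `pl_knn_predict`, the Euclidean distance, and the choice to split each neighbor's vote evenly across its candidate labels are all assumptions made for illustration.

```python
import numpy as np


def pl_knn_predict(X_train, candidate_sets, x, k, n_classes):
    """Predict a label for x from partially labeled neighbors.

    X_train: (n, d) array of training features.
    candidate_sets: list of n lists, each holding the candidate labels
        (the "bag of labels") of one training instance.
    """
    X_train = np.asarray(X_train, dtype=float)
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]  # indices of the k closest instances
    votes = np.zeros(n_classes)
    for i in nearest:
        # Assumed weighting: each neighbor spreads one vote over its bag.
        for c in candidate_sets[i]:
            votes[c] += 1.0 / len(candidate_sets[i])
    return int(votes.argmax())


X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]]
candidate_sets = [[0, 1], [0], [0, 2], [1, 2], [1]]
pred = pl_knn_predict(X_train, candidate_sets, x=[0.2, 0.2], k=3, n_classes=3)
```

In this toy example the three nearest neighbors all contain label 0 in their bags, so label 0 collects the most votes even though two of the three bags are ambiguous.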
Mitigating Source Bias for Fairer Weak Supervision
Theoretically, we show that it is possible for our approach to simultaneously improve both accuracy and fairness--in contrast to standard fairness approaches that suffer from tradeoffs. Empirically, we show that our technique improves accuracy on weak supervision baselines by as much as 32% while reducing demographic parity gap by 82.5%.
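For readers unfamiliar with the fairness metric quoted above, the following sketch computes a demographic parity gap under one common definition: the absolute difference in positive-prediction rates between two groups. The paper may use a different variant; the function name `demographic_parity_gap`, the binary predictions, and the 0/1 group encoding are assumptions.

```python
import numpy as np


def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rate between two groups.

    y_pred: binary predictions in {0, 1}.
    group: group membership in {0, 1}, same length as y_pred.
    """
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    rate_0 = y_pred[group == 0].mean()  # P(y_pred = 1 | group = 0)
    rate_1 = y_pred[group == 1].mean()  # P(y_pred = 1 | group = 1)
    return float(abs(rate_0 - rate_1))


y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 1])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
gap = demographic_parity_gap(y_pred, group)
```

With these toy inputs, group 0 receives a positive prediction three times out of four and group 1 once out of four, so the gap is 0.5; a gap of 0 would mean both groups are predicted positive at the same rate.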